1 – Introduction

1.1 – Welcome to Foundations of Data Analysis

1.2 – Working as a Team

Whether you take this course with your colleagues or on your own, you will benefit from its overall objective: cultivating the practice of integrated data analysis. We recommend that teams decide how they can best learn together. Our goal is to give you enough guidance to tailor this learning experience to your team's needs.

You will be introduced to a variety of learning materials throughout the four modules of this course. Here are some suggestions for how you might use the course material:

  • Go through the modules individually and schedule a meeting with your team for discussion and reflection
  • Take 5 minutes in your weekly team meeting to share a key takeaway
  • Complete an exercise individually and compare your approach and results with colleagues
  • Post your reflections or questions in your team's Slack or Teams
  • Create a group message thread to freely post and discuss the course material as you work through the modules
  • Think of a data-focused project your team is currently working on and see how you can relate it to what you are learning
  • Use the resources to apply some new thinking or strategy to leverage the data available to you

1.3 – Course Navigation

The course is organized into five categories: Think, Watch, Do, Best Practices, and Deep Dive.

The following guiding symbols will help you navigate the course content:

  • Best Practices: We will highlight a few key analytical best practices
  • Deep Dive: We will focus on a specific aspect of the analytical process
  • Do: You will do hands-on exercises to put your learning into practice
  • Think: Throughout the course, we will walk you through a real-life analytical scenario so that you can put yourself in the shoes of an analyst and reflect on actions and next steps
  • Watch: You will watch a few videos to learn about the analytical process

1.4 – Module Activities

There will be 4 modules:

  1. Planning your analysis
    Explore the guiding principles for analysis, the steps of the analytical process and best practices for planning your analysis.
  2. Preparing and analyzing data
    Best practices for implementing your plan, including preparing your data for analysis.
  3. Sharing your findings
    Best practices for interpreting your findings and telling your data story.
  4. Case study example
    Review the steps of the analytical process through an example to better understand how analysts go through each step.

2 – Making an Analytical Plan

2.1 – Get Started

Imagine that your manager has asked you to lead an analytical project.

A client needs information on the educational attainment of different groups within the Canadian population.

How would you get started? Take a moment to think about it, or brainstorm with your team. Write down 1 or 2 first steps in the box below.

2.2 – Making an analytical plan

So, how do you get an analytical project started?

Let’s start by watching a short video: Making an analytical plan (8:13)

This video introduces you to the steps involved in the analytical process and describes the first two steps of the process.

Key points

In the video, you learned that the analytical process can be viewed as a series of steps designed to answer a well-defined question. Once the topic has been defined, the next step is to create an analytical plan. Always incorporate the feedback you receive during the planning stage of your analytical project.

2.3 – Let’s go back to our scenario

The following actions are all part of the first steps of an analytical project. We provide more details for each action in the next pages.

  • Meet with the client to discuss their needs in more detail
  • Search for previously published information to identify what we already know about this topic and identify information gaps
  • Identify a suitable data source which contains the information you need
  • Identify your analytical question
  • Prepare an analytical plan and circulate it for review and approval
2.3.1 – Meet the client

Meet with the client to discuss their needs in more detail.

It’s important to know where you are going and to get buy-in from clients before diving into the analysis. During your meeting with the client, you seek to clarify their needs, because their original request is broad. Which population groups are they interested in? What level of detail or disaggregation do they need? Why do they need this information, and how will it help them make decisions?

You learn that the client is particularly interested in information about the educational attainment of women living in urban versus more rural areas. This topic has relevance for this client since they run a program to support the economic development and growth of rural communities. The relevant age group for this analysis would be people in the core working-age population (25 to 64 years old). The client is also interested in characterizing educational outcomes of different groups of women based on racialized group membership.

2.3.2 – Literature and data

Search for previously published information. It’s important to understand what is already known on the topic. In your review of the literature, you notice that there is indeed a lack of detailed information on educational outcomes for women from different racialized groups by area of residence in Canada. This study would make a significant contribution to the existing body of knowledge on educational outcomes of different population groups within Canada.

Identify a suitable data source. After careful consideration, and with the agreement of the client, you decide to use the Census of Population (long-form questionnaire) as your data source because of its representativeness and large sample size. Information on highest level of educational attainment, population group, gender, area of residence, and many other characteristics is available in the census. As a Statistics Canada employee, you have access to the raw data.

2.3.3 – Analytical question

You now have a better grasp of your analytical objective. Can you articulate the overall research question for your project?

Identify your analytical question

Take a moment to think about it, or brainstorm with your team. Write your question in the box below.

2.3.4 – Principle of relevance

It is always important to define the value that your analysis adds, whether to your organization, to your client, or to our understanding of the topic. Why is your question relevant? Why should we care about your work?

  • When planning your analysis, make sure that you are addressing issues of importance to policymakers, stakeholders, data users, and/or the general population.
  • Your analysis should contribute to a better understanding of current and emerging issues.
  • Another way to think about relevance is that your analysis should help decision-making—big or small.
INSERT QUIZ HERE
2.3.5 – Analytical plan

Prepare an analytical plan and circulate it for review and approval.

You are now ready to put together a one-pager describing your analytical plan for review and approval (add a template). The analytical plan should contain a short description of the background of the project along with your research questions or objectives, the data source(s) that will be used, the planned methodology, timelines, etc.

It’s a good idea at this point in the process to take the time to create a template table which, when populated with data, will allow you to provide answers to your research question. If you can’t translate your question into a data table, you probably can’t answer your question with the available data! Carrying out your analysis will be easier if you have mock-up tables to fill.

Think

Take a moment to think about what your table would look like. Take a pen and a piece of paper and try to create your (empty) table template.

2.4 – You are ready!

Your research question is:

What are the educational outcomes of women living in urban versus more rural areas, and are there differences by racialized group membership?

Here is a good table template to address your research question:

This table would allow you to compare the proportions of women with no diploma, a high school diploma, and postsecondary education for all women, for women in the total racialized population, and separately for each of the 6 largest racialized groups in Canada (South Asian, Chinese, Black, Filipino, Arab, and Latin American). Each of these comparisons is shown separately for women living in urban areas and for those living in rural areas.
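To make the template concrete, the short Python sketch below builds the same empty table as a data structure, with one row per area of residence and education level and one column per population group. The function names and the row/column layout are assumptions for illustration; only the category labels come from the description above.

```python
# A minimal sketch of the empty template table described above.
# Hypothetical layout: one row per (area, education level), one column per group.
AREAS = ["Urban", "Rural"]
EDUCATION_LEVELS = ["No diploma", "High school diploma", "Postsecondary education"]
GROUPS = [
    "All women",
    "Total racialized population",
    "South Asian",
    "Chinese",
    "Black",
    "Filipino",
    "Arab",
    "Latin American",
]


def build_template(areas, levels, groups):
    """Return an empty table: {(area, level): {group: None}}.

    None marks a cell that still has to be filled with a proportion (%).
    """
    return {
        (area, level): {group: None for group in groups}
        for area in areas
        for level in levels
    }


def print_template(table, groups):
    """Print the template as a plain-text grid; empty cells print as blanks."""
    print(" | ".join(["Area", "Education level"] + groups))
    for (area, level), cells in table.items():
        values = ["" if cells[g] is None else f"{cells[g]:.0f}%" for g in groups]
        print(" | ".join([area, level] + values))


template = build_template(AREAS, EDUCATION_LEVELS, GROUPS)
print_template(template, GROUPS)
```

If you cannot name every row and column this way, that is often a sign the question needs to be sharpened before any data are pulled.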

You are ready to move on to the next steps of the project!

3 – Preparing and analyzing data

3.1 – Implement your plan

With your manager and your client, you have identified a relevant research question to address along with an appropriate data source to leverage, and you have received feedback (and approval to go ahead) on your analytical plan.

What would be your next steps? Take a moment to think about it, or brainstorm with your team.

Write down one or two next steps in the box below.

3.2 – Implementing the analytical plan

Now that you have learned how to plan an analytical project, we will discuss best practices for preparing and analyzing your data.

To complete the activity, watch the video: Implementing the analytical plan (6:11).

This video will take you through the third and fourth steps of the analytical process.

Key points

In the video, you learned that before diving into the analyses, it is important to take the time to prepare and check your data.

3.3 – Let’s go back to our scenario

Let’s go back to our scenario. The following actions would all be part of the data preparation step. We provide more details for each action in the next pages.
  • Define your concepts
  • Finalize your template table
  • VIMO: check for valid, invalid, missing, and outlier values
  • Try to reproduce numbers that have been previously published with your data source or a similar data source
  • Summarize and check your results as you go

3.3.1 – Define your concepts

Taking the time to define your concepts using the appropriate standards is a crucial part of the analytical process. Looking back at your research question, which concepts will you need to define?

Your research question is:

What are the educational outcomes of women living in urban versus more rural areas, and are there differences by racialized group membership?

Take a moment to think about it, or brainstorm with your team. Write down which concepts you will need to define in the box below.

3.3.2 – Define your concepts (cont’d)

For your project, you need to define your concepts of educational outcome, women, urban vs. rural areas of residence, and racialized group membership. You consult the 2021 Census dictionary for standards and definitions.

  • The 2021 Census measured “Highest certificate, diploma or degree”; you decide to analyze this variable using three categories: no diploma, a high school diploma, and postsecondary education. Going back to the Census dictionary, you notice that 13 categories were collected. You therefore prepare your data by combining some of these categories into your 3 groups: no certificate, diploma, or degree (first category); a high school diploma (second category); and postsecondary education (all other categories combined).
  • You decide to define “women” using the gender variable.
  • Urban and rural areas can be conceptualized using the “Population centre” variable. According to the Census dictionary, population centres are classified into three groups, depending on the size of their population: small, medium, and large. All areas outside population centres are classified as rural areas. There is also a classification that groups all areas into very remote, remote, less accessible, accessible, and easily accessible areas (add the link/reference). You decide to use this concept of remoteness for your analysis.
  • The 2021 Census does not measure whether a person is from a racialized group, but it does measure the concept of “Visible minority”. The visible minority population consists of many groups, and you decide to focus on the six largest groups: South Asian, Chinese, Black, Filipino, Arab, and Latin American. You also decide to produce estimates for the total visible minority population.

3.3.3 – Finalize your template tables

Now that you have defined your concepts using census standards, you can finalize your table template to reflect your decisions.

Here is a good table template reflecting your decisions:

3.3.4 – VIMO: check for valid, invalid, missing, and outlier values

Assessing the accuracy and validity of your data is an important part of the analytical process. It is therefore important to double-check your dataset to identify any values that appear invalid or somehow missing, and that could mislead your analyses.

In the following video, we present methods to describe accuracy in terms of validity and correctness. We also discuss methods to validate and check the accuracy of data values.

To complete the activity, watch the video: Data Accuracy and Validation: Methods to ensure the quality of data (10:29)

3.3.5 – Preparing data for analysis

Taking the time to understand and verify the dataset(s) you will work with is a crucial part of the analytical process.

It can be as simple as double-checking your dataset(s) to identify any values that appear invalid or somehow missing, and that could mislead your analyses.

Take a look at the dataset below. There are some problematic values, which can be classified as:

  • Invalid data: values that are impossible and/or do not make sense
  • Missing data: the variable is left blank
  • Outlier data: values that may be true, but are extremely small or extremely large compared with what we would expect
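These three categories can be turned into simple, rule-based checks. The Python sketch below is one possible approach; the valid gender values and the age thresholds (ages above 120 treated as invalid, ages of 100 or more flagged as possible outliers) are assumptions chosen for illustration, not official edit rules.

```python
# Hypothetical rules for illustration; real edit rules come from the
# data source's documentation (valid code lists, plausible ranges, etc.).
VALID_GENDERS = {"Man", "Woman"}
VALID_YES_NO = {"Yes", "No"}
MAX_PLAUSIBLE_AGE = 120  # above this, the value is treated as invalid
OUTLIER_AGE = 100        # at or above this, the value is flagged for review


def check_age(age):
    """Classify an age value as 'missing', 'invalid', 'outlier' or 'valid'."""
    if age is None or age == "":
        return "missing"
    if not isinstance(age, int) or age < 0 or age > MAX_PLAUSIBLE_AGE:
        return "invalid"
    if age >= OUTLIER_AGE:
        return "outlier"
    return "valid"


def check_category(value, allowed):
    """Classify a categorical value as 'missing', 'invalid' or 'valid'."""
    if value is None or value == "":
        return "missing"
    return "valid" if value in allowed else "invalid"


# Example with hypothetical values.
for age in [45, None, -3, 105]:
    print(age, "->", check_age(age))
print(check_category("Unknown", VALID_GENDERS))
```

Note that an outlier is flagged, not corrected: as defined above, it may be a true value, so it calls for a closer look rather than automatic removal.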

Can you spot problematic values?

           Age in years   Gender   Visible minority
Person 1   34             Man      Yes
Person 2   102            Canada   (blank)
Person 3   56             Woman    No
Person 4   999            Woman    Yes

3.3.6 – Reproducing numbers previously published

Best Practices

Try to reproduce numbers that have been previously published with your data source or a similar data source.

Even though you did not find any information on your specific topic, you did find previously published information on the educational outcomes of men and women based on the census. You check whether your proportions of women by highest level of educational attainment match the published ones.
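A simple way to organize this check is to compare your computed percentages with the published ones, category by category, allowing for rounding. In the Python sketch below, the 0.5 percentage-point tolerance and the numbers in the example are hypothetical; they are not published census figures.

```python
def compare_to_published(computed, published, tolerance=0.5):
    """Compare {category: percent} dicts and return the categories that differ.

    A category is reported if it is missing from `computed` or differs from
    the published value by more than `tolerance` percentage points. The
    default tolerance is an assumption meant to absorb rounding only.
    """
    mismatches = {}
    for category, published_pct in published.items():
        computed_pct = computed.get(category)
        if computed_pct is None or abs(computed_pct - published_pct) > tolerance:
            mismatches[category] = (computed_pct, published_pct)
    return mismatches


# Illustrative numbers only (not census results).
published = {"No diploma": 10.0, "High school diploma": 25.0, "Postsecondary": 65.0}
computed = {"No diploma": 10.3, "High school diploma": 26.0, "Postsecondary": 63.7}

print(compare_to_published(computed, published))
```

A mismatch does not necessarily mean your data are wrong: definitions, populations (for example, the age range), or data versions may differ, and each difference is worth explaining before moving on.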

3.3.7 – Analyzing data

Now that you have familiarized yourself with your data set, and have cleaned and prepared it for analysis, it seems like you are finally ready to analyze your data.

How would you get started with your analysis?

Analyzing data is the step where data turns into information. This is where it gets interesting: you are finally looking for answers to your analytical questions.

This step should be straightforward if you identified clear questions to address and created table templates for each question.

This is where having a clear analytical plan comes in handy. One by one, you will go through your questions and produce tables that shed light on what you are investigating.

Be purposeful and intentional: remember that you are not on a fishing expedition, and that no single project can provide all the answers.

3.3.8 – Proportions, ratios and rates

In your project, you have one table template to fill with data. More specifically, you must calculate the proportions of women without any diploma, with a high school diploma, and with postsecondary education, by remoteness level and by racialized group membership.

Deep dive. What are proportions? A complete review of data analysis techniques is beyond the scope of this course. However, the easiest way to analyze data is often simply to compare one number with another. Watch the following video, which introduces the basic concepts of proportions, ratios, and rates.

To complete the activity, watch the video: Proportions, ratios and rates (13:16)
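As a quick illustration of the first two concepts, the Python sketch below computes proportions (a part divided by its whole, times 100) within groups, and a ratio between two values. It assumes each record may carry a survey weight under a hypothetical field name `weight`; records without one are counted once, which gives an unweighted proportion. Check your data source's documentation for how weights should be used.

```python
from collections import defaultdict


def proportions_by_group(records, group_key, category_key, weight_key="weight"):
    """Return {group: {category: percent}}.

    Each percent is the (weighted) count of records in that category divided
    by the (weighted) count of all records in the group, times 100.
    Records without a weight are counted once (weight = 1).
    """
    group_totals = defaultdict(float)
    category_totals = defaultdict(lambda: defaultdict(float))
    for record in records:
        weight = record.get(weight_key, 1.0)
        group = record[group_key]
        group_totals[group] += weight
        category_totals[group][record[category_key]] += weight
    return {
        group: {cat: 100 * w / group_totals[group] for cat, w in cats.items()}
        for group, cats in category_totals.items()
    }


def ratio(a, b):
    """Ratio of two quantities, e.g. how many times larger a is than b."""
    return a / b


# Hypothetical records, for illustration only.
records = [
    {"area": "Rural", "education": "Postsecondary", "weight": 2.0},
    {"area": "Rural", "education": "High school", "weight": 2.0},
    {"area": "Urban", "education": "Postsecondary", "weight": 3.0},
    {"area": "Urban", "education": "High school", "weight": 1.0},
]

result = proportions_by_group(records, "area", "education")
print(result)
print(ratio(result["Urban"]["Postsecondary"], result["Rural"]["Postsecondary"]))
```

Here the ratio compares two proportions: the urban share of postsecondary graduates relative to the rural share in this made-up example.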

3.3.9 – Finding patterns in the data

You have calculated the proportion of women without any diploma; with a high school diploma; and with postsecondary education, for women overall and women from racialized groups, by areas of residence in Canada.

Take a look at the table here:

Looking only at the data for women overall and women from racialized groups, what patterns start to emerge?

Take a moment to think about it, or brainstorm with your team. Write down a few patterns that start to emerge in the box below.

3.3.10 – Summarize and check your results as you go

Summarize your results as you go: it is a good idea to write down a few bullets beside your table(s) as you start to populate them, to summarize the message. For instance, when looking at the data for women overall, we can say that…

  • the lowest proportion of women with postsecondary education is observed among women living in very remote areas (40%) (highlight in the table)
  • the highest proportion of women with postsecondary education is observed among women living in easily accessible areas (69%) (highlight in the table)

Check your results as you go: don’t forget to check the quality of the estimates that you are producing. One important aspect to verify is the sample size behind each table cell. This is especially important if you are disaggregating your data at very fine levels of detail. Estimates based on small sample sizes are associated with more variability (or uncertainty) and, in some cases, should not be published at all.
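To see why small samples matter, consider the textbook standard error of a proportion, sqrt(p(1 - p)/n), which assumes simple random sampling. Real surveys and censuses often use more complex designs, so official quality indicators should come from the data source's documentation; this Python sketch only illustrates how uncertainty grows as the sample size n shrinks.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p (between 0 and 1) estimated from a
    simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error_points(p, n, z=1.96):
    """Approximate 95% margin of error, expressed in percentage points."""
    return 100 * z * standard_error(p, n)


# The same 40% estimate becomes far less precise as the sample shrinks.
for n in (25, 100, 1000, 10000):
    print(f"n = {n:>5}: 40% +/- {margin_of_error_points(0.40, n):.1f} points")
```

Because the margin of error is proportional to 1/sqrt(n), cutting the sample size by a factor of 4 doubles it under this formula.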

3.4 – Let’s go back to our scenario

For your project, you planned to disaggregate your data on educational outcomes for 5 area types (ranging from easily accessible areas to very remote areas), and for 6 racialized groups (South Asian, Chinese, Black, Filipino, Arab, and Latin American). Can your data support this level of disaggregation?

You examine your data and notice that you will run into sample size issues for women living in very remote areas. There are very few South Asian, Chinese, Black, Filipino, Arab, and Latin American women living in the most remote regions of Canada. Based on Census confidentiality rules, you will have to suppress all data points for these women for “No certificate, diploma or degree” and “High school diploma”. The only publishable data point for these women is the category “Postsecondary education”.
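Applying suppression to a finished table can be sketched as below: every cell whose estimate rests on fewer than a minimum number of records is replaced by a symbol (here "x", the symbol Statistics Canada tables use for data suppressed for confidentiality). The minimum of 10 records and the numbers in the example are hypothetical; the actual Census confidentiality rules are more detailed and are not reproduced here.

```python
SUPPRESSED = "x"  # symbol shown in place of a suppressed estimate


def suppress_small_cells(estimates, sample_sizes, min_records):
    """Return a copy of `estimates` in which every cell based on fewer than
    `min_records` records is replaced by SUPPRESSED.

    Both dicts are keyed by the same cell identifiers. `min_records` is a
    placeholder: actual thresholds come from the data source's rules.
    """
    return {
        cell: (SUPPRESSED if sample_sizes[cell] < min_records else value)
        for cell, value in estimates.items()
    }


# Hypothetical cells for one group of women living in very remote areas.
estimates = {
    "No certificate, diploma or degree": 20.0,
    "High school diploma": 25.0,
    "Postsecondary education": 55.0,
}
sample_sizes = {
    "No certificate, diploma or degree": 4,
    "High school diploma": 6,
    "Postsecondary education": 14,
}

print(suppress_small_cells(estimates, sample_sizes, min_records=10))
```

A real disclosure-control process may also need to suppress additional cells so that a hidden value cannot be recovered by subtraction from a total; this sketch handles only the small-sample rule.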

After much data crunching, you have filled out your table template. You are ready to move on to the next steps of the project!

4 – Sharing your findings

4.1 – Summarize, interpret and disseminate

Now that we've learned how to plan and implement an analytical project, we will discuss best practices for summarizing and sharing your findings.

To complete the activity, watch the video: Sharing your findings (11:38)

This video will take you through the last two steps of the analytical process.

Key points

In the video, you learned about the importance of interpreting your findings using clear and neutral language, and of staying true to your analytical question while telling your data story.

Remember

Analyzing data is the step where you turn the data into information. This is where it gets interesting: you are finally looking for answers to your analytical questions. Answers to your questions can be expressed as key messages.

4.2 – Let’s go back to our scenario

You have now produced your table to provide information on the educational attainment of women from different areas of residence and from different racialized groups within Canada. It seems like you are now ready to summarize your findings.

Let’s practice the art of extracting key messages from tables. Here is your table template:

Looking at your complete data table, what would be the main messages? Please express them in plain and neutral language. Take a moment to think about it, or brainstorm with your team.

4.3 – Extracting key messages

One way to get started can be to first focus on the patterns for women overall (highlight first data column). How are the different levels of education distributed across areas of residence for women overall?

One finding that pops out is that the lowest proportion of women with postsecondary education is observed among women living in very remote areas (40%) (highlight in the table), while the highest proportion of women with postsecondary education is observed among women living in easily accessible areas (69%) (highlight in the table).

Let’s turn to patterns for women from racialized groups (highlight second data column). What stands out? Interestingly, it looks like the highest proportion of racialized women with postsecondary education is observed among those living in very remote areas (80%) (highlight in the table).

How about women from specific groups (highlight data columns for South Asian, Chinese, Black, Filipino, Latin American, and Arab)? Which groups follow the same pattern? Which groups show a different pattern?

Pause and take a moment to investigate.

4.4 – Principle of neutrality

Always aim to express your key messages objectively. Let’s take a deep dive into an important guiding principle for analysis: neutrality.

  • The principle of neutrality means that you should aim for an impartial presentation of your results.
  • Neutrality means that we let the data speak for themselves.
  • Ensure that you’re maintaining neutrality by using plain language and not overstating your results or speculating when interpreting them.
  • Avoid subjective qualifiers. For example:
      Subjective: A massive proportion of Canadians increased their consumption of harmful sugary snacks in 2021.
      Neutral: The proportion of Canadians who reported consumption of snacks with high sugar content went from x% in 2016 to x% in 2021.

4.5 – Telling the data story

You are now ready to prepare your work for dissemination and communicate your findings!

Presenting your findings clearly to others is one of the most challenging aspects of the analytical process. Let’s discuss in more detail the idea of using data to tell a story.

In the next video, we will talk about storytelling and describe the different components of a data story, including the data; the narrative; and the visualizations. We will also discuss how each component can be used to construct concise, informative, and engaging messages your audience will remember.

To complete the activity, watch the video: Telling the data story: How to create stories that matter (12:35)

Key points

In the video, you learned that the three most important components of a data story are the data, the narrative, and the visualizations. We also talked about the importance of planning your data story by first determining who your audience is, what the goal of the story should be, and how it might best be presented.

4.6 – Let’s go back to our scenario

You have analyzed your data table and carefully selected the key findings that provided an answer to your research questions. The goal of your data story was to describe how educational outcomes for women varied by areas of residence and membership in racialized groups.

You decide to share your findings via two formats: a short research report and a presentation. The goal of the research report is to inform the client that education outcomes for women vary by area of residence and by membership in racialized groups. The report contains all the methodological information that someone with a fair amount of data literacy would need. The presentation, on the other hand, is a more visual and engaging summary to quickly disseminate the storyline. Both formats have met their goal of informing their audience accordingly.

4.7 – Data visualization

Deep dive. As we saw in the previous video, data visualizations are a very important part of your data story. The next video will give you a better understanding of data visualizations and how they can be used to present data in an engaging and aesthetically pleasing way.

Interested to learn more? Watch the video: Data Visualization: An introduction (10:54)

4.8 – Let’s go back to our scenario

Can you think of effective, simple graphs that would help make your key messages pop?

Take a moment to think about it, or brainstorm with your team. Feel free to use Excel to create 1 or 2 charts.

Going back to when we were comparing the proportions of women with postsecondary education by areas of residence and racialized group membership, this is an example of a chart that would make the observed patterns pop:

4.9 – Review process

Don’t forget to seek feedback on your work before disseminating your findings!

Remember: your work should go through an extensive review process before being considered “Final”. You can request feedback from a range of people such as colleagues, managers, subject matter experts and data or methodology experts. Ask your reviewers for feedback on different aspects of your work, such as the clarity of your analytical objectives, appropriateness of the data you've used, definition of concepts, review of literature, methodological approach, interpretation of your results, and clarity and neutrality of your writing.

In the next module, we will put it all together and walk you through a case study showing how an analyst has gone through all the steps of the analytical process in their project.

5 – Case Study Example

5.1 – Walkability in neighbourhoods

In this video, we will walk you through an example to give you an overview of how an analyst has gone through all the steps of the analytical process in their project.

To complete the activity, watch the video: Analysis 101, part 4: Case study (9:01)

6 – Test your Knowledge

7 – Conclusion

7.1 – Congratulations

You've reached the end of the Foundations of data analysis: steps of the analytical process course.

What comes next?

Help us improve future courses and let us know what you thought of this one, by completing the course evaluation survey that you will receive by email.

7.2 – Additional Resources